Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
Abstract
This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints, that words tend to have one sense per discourse and one sense per collocation, exploited in an iterative bootstrapping procedure. Tested accuracy exceeds [...].

Introduction

This paper presents an unsupervised algorithm that can accurately disambiguate word senses in a large, completely untagged corpus. The algorithm avoids the need for costly hand-tagged training data by exploiting two powerful properties of human language:

1. One sense per collocation: Nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order and syntactic relationship.
2. One sense per discourse: The sense of a target word is highly consistent within any given document.

Moreover, language is highly redundant, so that the sense of a word is effectively overdetermined by (1) and (2) above. The algorithm uses these properties to incrementally identify collocations for target senses of a word, given a few seed collocations for each sense. This procedure is robust and self-correcting, and exhibits many strengths of supervised approaches, including sensitivity to word-order information lost in earlier unsupervised algorithms.

Note: The problem here is sense disambiguation: assigning each instance of a word to established sense definitions (such as in a dictionary). This differs from sense induction: using distributional similarity to partition word instances into clusters that may have no relation to standard sense partitions.

Note: Here I use the traditional dictionary definition of collocation, "appearing in the same location; a juxtaposition of words". No idiomatic or non-compositional interpretation is implied.

One Sense Per Discourse

The observation that words strongly tend to exhibit only one sense in a given discourse or document was stated and quantified in Gale, Church and Yarowsky. Yet to date, the full power of this property has not been exploited for sense disambiguation.

The work reported here is the first to take advantage of this regularity in conjunction with separate models of local context for each word. Importantly, I do not use one-sense-per-discourse as a hard constraint; it affects the classification probabilistically and can be overridden when local evidence is strong.

In this current work, the one-sense-per-discourse hypothesis was tested on a set of examples hand-tagged over a period of years (the same data studied in the disambiguation experiments). For these words, the table below measures the claim's accuracy (when the word occurs more than once in a discourse, how often it takes on the majority sense for the discourse) and applicability (how often the word does occur more than once in a discourse).

The one-sense-per-discourse hypothesis:

Word     Senses              Accuracy   Applicability
plant    living/factory      [...]      [...]
tank     vehicle/container   [...]      [...]
poach    steal/boil          [...]      [...]
palm     tree/hand           [...]      [...]
axes     grid/tools          [...]      [...]
sake     benefit/drink       [...]      [...]
bass     fish/music          [...]      [...]
space    volume/outer        [...]      [...]
motion   legal/physical      [...]      [...]
crane    bird/machine        [...]      [...]
Average                      [...]      [...]

Clearly, the claim holds with very high reliability for these words, and may be confidently exploited as another source of evidence in sense tagging.

Note: It is interesting to speculate on the reasons for this phenomenon. Most of the tendency is statistical: two distinct arbitrary terms of moderate corpus frequency are quite unlikely to co-occur in the same discourse, whether they are homographs or not. This is particularly true for content words, which exhibit a "bursty" distribution. However, it appears that human writers also have some active tendency to avoid mixing senses within a discourse. In a small study, homograph pairs were observed to co-occur roughly [...] times less often than arbitrary word pairs of comparable frequency. Regardless of origin, this phenomenon is strong enough to be of significant practical use as an additional probabilistic disambiguation constraint.

One Sense Per Collocation

The strong tendency for words to exhibit only one sense in a given collocation was observed and quantified in Yarowsky. This effect varies depending on the type of collocation. It is strongest for immediately adjacent collocations, and weakens with distance. It is much stronger for words in a predicate-argument relationship than for arbitrary associations at equivalent distance. It is very much stronger for collocations with content words than for those with function words. In general, the high reliability of this behavior (in excess of [...] for adjacent content words, for example) makes it an extremely useful property for sense disambiguation.

Note: The content-word versus function-word effect is actually a continuous function conditional on the burstiness of the word (the tendency of a word to deviate from a constant Poisson distribution in a corpus).

A supervised algorithm based on this property is given in Yarowsky. Using a decision list control structure based on Rivest, this algorithm integrates a wide diversity of potential evidence sources (lemmas, inflected forms, parts of speech and arbitrary word classes) in a wide diversity of positional relationships (including local and distant collocations, trigram sequences, and predicate-argument association). The training procedure computes the word-sense probability distributions for all such collocations, and orders them by the log-likelihood ratio

$$\log\left(\frac{\Pr(\mathrm{Sense}_A \mid \mathrm{Collocation}_i)}{\Pr(\mathrm{Sense}_B \mid \mathrm{Collocation}_i)}\right)$$

with optional steps for interpolation and pruning. New data are classified by using the single most predictive piece of disambiguating evidence that appears in the target context. By not combining probabilities, this decision-list approach avoids the problematic complex modeling of statistical dependencies encountered in other frameworks. The algorithm is especially well suited for utilizing a large set of highly non-independent evidence such as found here. In general, the decision-list algorithm is well suited for the task of sense disambiguation and will be used as a component of the unsupervised algorithm below.

Note: As most ratios involve a 0 for some observed value, smoothing is crucial. The process employed here is sensitive to variables including the type of collocation (adjacent bigrams or wider context), collocational distance, type of word (content word vs. function word) and the expected amount of noise in the training data. Details are provided in Yarowsky (to appear).

Unsupervised Learning Algorithm

Words not only tend to occur in collocations that reliably indicate their sense, they tend to occur in multiple such collocations. This provides a mechanism for bootstrapping a sense tagger. If one begins with a small set of seed examples representative of two senses of a word, one can incrementally augment these seed examples with additional examples of each sense, using a combination of the one-sense-per-collocation and one-sense-per-discourse tendencies.

Although several algorithms can accomplish similar ends, the following approach has the advantages of simplicity and the ability to build on an existing supervised classification algorithm without modification. As shown empirically, it also exhibits considerable effectiveness.

Note: The alternative algorithms include variants of the EM algorithm (Baum; Dempster et al.), especially as applied in Gale, Church and Yarowsky.

Note: Indeed, any supervised classification algorithm that returns probabilities with its classifications may potentially be used here. These include Bayesian classifiers (Mosteller and Wallace) and some implementations of neural nets, but not Brill rules (Brill).

The algorithm will be illustrated by the disambiguation of instances of the polysemous word plant in a previously untagged corpus.

STEP 1

In a large corpus, identify all examples of the given polysemous word, storing their contexts as lines in an initially untagged training set. For example:

Sense   Training Examples (Keyword in Context)
        company said the plant is still operating
        Although thousands of plant and animal species
        zonal distribution of plant life
        to strain microscopic plant life from the
        vinyl chloride monomer plant which is
        and Golgi apparatus of plant and animal cells
        computer disk drive plant located in
        divide life into plant and animal kingdom
        close-up studies of plant life and natural
        Nissan car and truck plant in Japan is
        keep a manufacturing plant profitable without
        molecules found in plant and animal tissue
        union responses to plant closures
        animal rather than plant tissues can be
        many dangers to plant and animal life
        company manufacturing plant is in Orlando
        growth of aquatic plant life in water
        automated manufacturing plant in Fremont
        Animal and plant life are delicately
        discovered at a St. Louis plant manufacturing
        computer manufacturing plant and adjacent
        the proliferation of plant and animal life

STEP 2

For each possible sense of the word, identify a relatively small number of training examples representative of that sense. This could be accomplished by hand-tagging a subset of the training sentences. However, I avoid this laborious procedure by identifying a small number of seed collocations representative of each sense and then tagging all training examples containing the seed collocates with the seed's sense label. The remainder of the examples typically constitute an untagged residual. Several strategies for identifying seeds that require minimal or no human participation are discussed in Section [...].

In the example below, the words life and manufacturing are used as seed collocations for the two major senses of plant (labeled A and B respectively). This partitions the training set into examples of living plants, examples of manufacturing plants, and residual examples.

Sense   Training Examples (Keyword in Context)
A       used to strain microscopic plant life from the
A       zonal distribution of plant life
A       close-up studies of plant life and natural
A       too rapid growth of aquatic plant life in water
A       the proliferation of plant and animal life
A       establishment phase of the plant virus life cycle
A       that divide life into plant and animal kingdom
A       many dangers to plant and animal life
A       mammals. Animal and plant life are delicately
A       beds too salty to support plant life. River
A       heavy seas, damage, and plant life growing on
A       ...
        vinyl chloride monomer plant which is
        molecules found in plant and animal tissue
        Nissan car and truck plant in Japan is
        and Golgi apparatus of plant and animal cells
        union responses to plant closures
        cell types found in the plant kingdom are
        company said the plant is still operating
        Although thousands of plant and animal species
        animal rather than plant tissues can be
        computer disk drive plant located in
B       ...
B       automated manufacturing plant in Fremont
B       vast manufacturing plant and distribution
B       chemical manufacturing plant producing viscose
B       keep a manufacturing plant profitable without
B       computer manufacturing plant and adjacent
B       discovered at a St. Louis plant manufacturing
B       copper manufacturing plant found that they
B       copper wire manufacturing plant for example
B       's cement manufacturing plant in Alpena
B       polystyrene manufacturing plant at its Dow
B       company manufacturing plant is in Orlando

It is useful to visualize the process of seed development graphically. The following figure illustrates this sample initial state. Circled regions are the training examples that contain either an A or B seed collocate. The bulk of the sample points constitute the untagged residual.

[Figure: sample initial state]

Note: For the purposes of exposition, I will assume a binary sense partition. It is straightforward to extend this to k senses using k sets of seeds.
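The seed-tagging step and the log-likelihood ordering described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `tag_with_seeds` and `rank_collocations`, the whitespace tokenization, the use of any word in the context line as a "collocation", and the additive smoothing constant `alpha` are all assumptions made for illustration. The text only says that smoothing is crucial and depends on several variables.

```python
import math
from collections import Counter

def tag_with_seeds(contexts, seeds):
    """Label each context line with the sense of the seed word it contains.

    Lines with no seed word, or with seeds of conflicting senses, stay
    untagged (None) and form the residual.
    """
    tagged = []
    for line in contexts:
        tokens = set(line.lower().split())  # assumption: whitespace tokens
        senses = {sense for word, sense in seeds.items() if word in tokens}
        tagged.append((senses.pop() if len(senses) == 1 else None, line))
    return tagged

def rank_collocations(tagged, target, alpha=0.1):
    """Order context words by |log(Pr(A | word) / Pr(B | word))|.

    Both probabilities share the denominator count(word), so the ratio
    reduces to count_A / count_B. alpha is a hypothetical additive
    smoothing constant that avoids division by zero.
    """
    counts = {"A": Counter(), "B": Counter()}
    for sense, line in tagged:
        if sense is None:
            continue  # the residual contributes no evidence yet
        for word in set(line.lower().split()):
            if word != target:
                counts[sense][word] += 1
    rules = []
    for word in set(counts["A"]) | set(counts["B"]):
        llr = math.log((counts["A"][word] + alpha) / (counts["B"][word] + alpha))
        rules.append((abs(llr), word, "A" if llr > 0 else "B"))
    rules.sort(reverse=True)  # most predictive evidence first
    return rules

# A few context lines taken from the example table above.
contexts = [
    "zonal distribution of plant life",
    "close up studies of plant life and natural",
    "many dangers to plant and animal life",
    "automated manufacturing plant in Fremont",
    "computer manufacturing plant and adjacent",
    "union responses to plant closures",
]
seeds = {"life": "A", "manufacturing": "B"}

tagged = tag_with_seeds(contexts, seeds)
rules = rank_collocations(tagged, target="plant")
for score, word, sense in rules[:5]:
    print(f"{score:.2f}  {word}  ->  {sense}")
```

With so few lines, frequent function words score almost as highly as the seeds themselves, which is one reason the pruning and collocation-type sensitivity mentioned in the text matter in practice.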
Similar resources
Word Sense Induction and Disambiguation Rivaling Supervised Methods
Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context and successful approaches are known to benefit many applications in Natural Language Processing. Although supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words...
Learning Probabilistic Models of Word Sense Disambiguation
This dissertation presents several new methods of supervised and unsupervised learning of word sense disambiguation models. The supervised methods focus on performing model searches through a space of probabilistic models, and the unsupervised methods rely on the use of Gibbs Sampling and the Expectation Maximization (EM) algorithm. In both the supervised and unsupervised case, the Naive Bayesi...
Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambigous Synonyms
This paper compares two approaches to word sense disambiguation using word embeddings trained on unambiguous synonyms. The first one is an unsupervised method based on computing log probability from sequences of word embedding vectors, taking into account ambiguous word senses and guessing correct sense from context. The second method is supervised. We use a multilayer neural network model to l...
Combining Supervised and Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation
This work combines a set of available techniques – which could be further extended – to perform noun sense disambiguation. We use several unsupervised techniques (Rigau et al., 1997) that draw knowledge from a variety of sources. In addition, we also apply a supervised technique in order to show that supervised and unsupervised methods can be combined to obtain better results. This paper tries ...
Combining Unsupervised and Supervised Methods for PP Attachment Disambiguation
Statistical methods for PP attachment fall into two classes according to the training material used: first, unsupervised methods trained on raw text corpora and second, supervised methods trained on manually disambiguated examples. Usually supervised methods win over unsupervised methods with regard to attachment accuracy. But what if only small sets of manually disambiguated material are avail...
Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguati...
Publication date: 1995